Goto

Collaborating Authors

 game theory


AITesting Should Account for Sophisticated Strategic Behaviour

Neural Information Processing Systems

This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need account for the possibility that AI systems understand their circumstances and reason strategically. Second, game-theoretic analysis can inform evaluation design by formalising and scrutinising the reasoning in evaluation-based safety cases. Drawing on examples from existing AI systems, a review of relevant research, and formal strategic analysis of a stylised evaluation scenario, we present evidence for these claims and motivate several research directions.


GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Neural Information Processing Systems

We propose GAM-Agent, a game-theoretic multi-agent framework for enhancing vision-language reasoning. Unlike prior single-agent or monolithic models, GAM-Agent formulates the reasoning process as a non-zero-sum game between base agents--each specializing in visual perception subtasks--and a critical agent that verifies logic consistency and factual correctness. Agents communicate via structured claims, evidence, and uncertainty estimates. The framework introduces an uncertainty-aware controller to dynamically adjust agent collaboration, triggering multi-round debates when disagreement or ambiguity is detected.


Procurement Auctions with Predictions: Improved Frugality for Facility Location

Neural Information Processing Systems

We study the problem of designing procurement auctions for the strategic uncapacitated facility location problem: a company needs to procure a set of facility locations in order to serve its customers and each facility location is owned by a strategic agent. Each owner has a private cost for providing access to their facility (e.g., renting it or selling it to the company) and needs to be compensated accordingly. The goal is to design truthful auctions that decide which facilities the company should procure and how much to pay the corresponding owners, aiming to minimize the total cost, i.e., the monetary cost paid to the owners and the connection cost suffered by the customers (their distance to the nearest facility). We evaluate the performance of these auctions using the frugality ratio. We first analyze the performance of the classic VCG auction in this context and prove that its frugality ratio is exactly 3. We then leverage the learning-augmented framework and design auctions that are augmented with predictions regarding the owners' private costs. Specifically, we propose a family of learning-augmented auctions that achieve significant payment reductions when the predictions are accurate, leading to much better frugality ratios. At the same time, we demonstrate that these auctions remain robust even if the predictions are arbitrarily inaccurate, and maintain reasonable frugality ratios even under adversarially chosen predictions. We finally provide a family of "error-tolerant" auctions that maintain improved frugality ratios even if the predictions are only approximately accurate, and we provide upper bounds on their frugality ratio as a function of the prediction error.


Online Learning in the Repeated Mediated Newsvendor Problem

Neural Information Processing Systems

Motivated by real-life supply chain management, we study a repeated newsvendor problem in which the learner is a mediator that facilitates trades between suppliers and retailers in a sequence of supplier/retailer interactions. At each time step, a new supplier and retailer join the mediator's platform with a private production cost and utility function, respectively, and the platform proposes a unitary trading price. The supplier accepts the proposed price if it meets or exceeds their unitary production cost and communicates their decision to the platform; simultaneously, the retailer decides the quantity to purchase at the proposed trading price based on their private utility function and sends their decision to the platform. If the supplier accepts the trading price, the transaction proceeds, and the retailer purchases their chosen quantity of units, paying the product of this quantity and the trading price to the supplier. The mediator's objective is to maximize social welfare. We design an online mediator's pricing strategy that features sharp regret rates under some natural assumptions, and we investigate the necessity of these assumptions, proving that relaxing any of them leads to unlearnability.


Theoretical Guarantees for the Retention of Strict Nash Equilibria by Coevolutionary Algorithms

Neural Information Processing Systems

Most methods for finding a Nash equilibrium rely on procedures that operate over the entire action space, making them infeasible for settings with too many actions to be searched exhaustively. Randomised search heuristics such as coevolutionary algorithms offer benefits in such settings, however they lack many of the theoretical guarantees established for exhaustive methods such as zero-regret learning. We address this by developing a method for proving necessary and sufficient conditions for a coevolutionary algorithm to be stable, in the sense that it reliably retains a Nash equilibrium following discovery. As the method provides bounds that are adapted to both application and algorithm instance, it can be used as a practical tool for parameter configuration. We additionally show how bounds on regret may be deduced from our results and undertake corresponding empirical analysis.


The Burden of Interactive Alignment with Inconsistent Preferences

Neural Information Processing Systems

From media platforms to chatbots, algorithms shape how people interact, learn, and discover information. Such interactions between users and an algorithm often unfold over multiple steps, during which strategic users can guide the algorithm to better align with their true interests by selectively engaging with content. However, users frequently exhibit inconsistent preferences: they may spend considerable time on content that offers little long-term value, inadvertently signaling that such content is desirable. Focusing on the user side, this raises a key question: what does it take for such users to align the algorithm with their true interests? To investigate these dynamics, we model the user's decision process as split between a rational "system 2" that decides whether to engage and an impulsive "system 1" that determines how long engagement lasts.


Efficient Last-Iterate Convergence in Solving Extensive-Form Games

Neural Information Processing Systems

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms instead of Regret Matching (RM)-based CFR algorithms in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFGs is infeasible, making parameter-free algorithms highly desirable. This paper show that CFR+, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence for RM-based CFR algorithms in perturbed regularized EFGs. Leveraging CFR+ to solve perturbed regularized EFGs, we get Reward Transformation CFR+ (RTCFR+). Importantly, we extend prior work on the parameter-free property of CFR+, enhancing its stability, which is vital for the empirical convergence of RTCFR+. Experiments show that RTCFR+ exhibits a significantly faster empirical convergence rate than existing algorithms that achieve theoretical last-iterate convergence.



Pareto Optimal Risk Measure Agnostic Distributional Bandits with Heavy-Tail Rewards

Neural Information Processing Systems

This paper addresses the problem of multi-risk measure agnostic multi-armed bandits in heavy-tailed reward settings. We propose a framework that leverages novel deviation inequalities for the 1-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. The distributional LCB (DistLCB) algorithm is introduced, which achieves asymptotic optimality by deriving the first lower bounds for risk measure aware bandits with explicit sub-optimality gap dependencies. The DistLCB is further extended to multi-risk objectives, which enables Pareto-optimal solutions that consider multiple aspects of reward distributions. Additionally, we provide a regret analysis that includes both gap-dependent and gap-independent bounds for multi-risk settings. Experiments validate the effectiveness of the proposed methods in synthetic and real-world applications.


Stable Matching with Ties: Approximation Ratios and Learning

Neural Information Processing Systems

We study matching markets with ties, where workers on one side of the market may have tied preferences over jobs, determined by their matching utilities. Unlike classical two-sided markets with strict preferences, no single stable matching exists that is utility-maximizing for all workers. To address this challenge, we introduce the Optimal Stable Share (OSS)-ratio, which measures the ratio of a worker's maximum achievable utility in any stable matching to their utility in a given matching. We prove that distributions over only stable matchings can incur linear utility losses, i.e., an Ω(N) OSS-ratio, where N is the number of workers. To overcome this, we design an algorithm that efficiently computes a distribution over (possibly non-stable) matchings, achieving an asymptotically tight O(logN) OSS-ratio. When exact utilities are unknown, our second algorithm guarantees workers a logarithmic approximation of their optimal utility under bounded instability. Finally, we extend our offline approximation results to a bandit learning setting where utilities are only observed for matched pairs. In this setting, we consider worker-optimal stable regret, design an adaptive algorithm that smoothly interpolates between markets with strict preferences and those with statistical ties, and establish a lower bound revealing the fundamental trade-off between strict and tied preference regimes.